Extracts BitString to code points array to BitString class#721
Extracts BitString to code points array to BitString class#721mward-sudo wants to merge 7 commits intobartblast:devfrom
Conversation
…ds parameter validation
📝 WalkthroughWalkthroughCentralizes UTF‑8 validation and decoding into Bitstring by adding static UTF‑8 helpers (decoding, validation, truncation detection, and codepoint conversion) and refactors Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
assets/js/bitstring.mjs (1)
255-269: Consider hoisting lookup objects to module/class level.
firstByteMasks(Line 259) andminValueForLength(Line 623) are allocated on every call. Since these are constants, they could be static class fields or module-level constants to avoid repeated allocation.♻️ Suggested refactor
export default class Bitstring { static `#decoder` = ERTS.utf8Decoder; static `#encoder` = new TextEncoder("utf-8"); + static `#utf8FirstByteMasks` = {2: 0x1f, 3: 0x0f, 4: 0x07}; + static `#utf8MinCodePointForLength` = {1: 0, 2: 0x80, 3: 0x800, 4: 0x10000};Then reference
$.#utf8FirstByteMasks[length]and$.#utf8MinCodePointForLength[encodingLength]respectively.Also applies to: 617-633
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@assets/js/bitstring.mjs` around lines 255 - 269, The function decodeUtf8CodePoint allocates the lookup object firstByteMasks on every call (and similarly minValueForLength elsewhere), so hoist these constant maps to module/class scope as static fields to avoid repeated allocation; create constants (e.g. utf8FirstByteMasks and utf8MinCodePointForLength) at the top of the module or as private static fields on the BitString class and replace local usages of firstByteMasks and minValueForLength with references to those static/module constants (e.g., BitString.#utf8FirstByteMasks[length] or the module-level utf8FirstByteMasks[length]).assets/js/erlang/unicode.mjs (1)
271-354: Consider extracting a shared normalization handler parameterized by form.The
characters_to_nfc_binary/1,characters_to_nfd_binary/1,characters_to_nfkc_binary/1, andcharacters_to_nfkd_binary/1functions are structurally identical — they differ only in the normalization form string ("NFC","NFD","NFKC","NFKD"). ThehandleInvalidUtf8,handleConversionError, andvalidateListResthelpers are duplicated verbatim across all four.Since this PR already modifies the
handleInvalidUtf8in each variant (switching toBitstring.getValidUtf8Length), it would be a natural time to extract a shared factory:♻️ Sketch
const makeNormalizationBinaryFn = (form) => (data) => { const validateListRest = (rest) => { /* shared */ }; const handleConversionError = (tag, prefix, rest) => { // ...textPrefix.normalize(form)... }; const handleInvalidUtf8 = (bytes) => { // ...validText.normalize(form)... }; // ...main logic identical... }; "characters_to_nfc_binary/1": makeNormalizationBinaryFn("NFC"), "characters_to_nfd_binary/1": makeNormalizationBinaryFn("NFD"), "characters_to_nfkc_binary/1": makeNormalizationBinaryFn("NFKC"), "characters_to_nfkd_binary/1": makeNormalizationBinaryFn("NFKD"),Also applies to: 501-583, 587-668, 672-754
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@assets/js/erlang/unicode.mjs` around lines 271 - 354, The four functions characters_to_nfc_binary/1, characters_to_nfd_binary/1, characters_to_nfkc_binary/1, and characters_to_nfkd_binary/1 duplicate the same helpers (validateListRest, handleConversionError, handleInvalidUtf8) and differ only by the normalization form string; extract a factory like makeNormalizationBinaryFn(form) that returns the function implementing the shared logic and uses the form when calling String.prototype.normalize (i.e., replace hardcoded "NFC" with the parameter), then replace each of the four exported entries with makeNormalizationBinaryFn("NFC"), makeNormalizationBinaryFn("NFD"), makeNormalizationBinaryFn("NFKC"), and makeNormalizationBinaryFn("NFKD") respectively, keeping references to Erlang_Unicode["characters_to_binary/3"], Bitstring.toText, Bitstring.getValidUtf8Length, Type.bitstring, and Type.tuple unchanged inside the factory.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@assets/js/bitstring.mjs`:
- Around line 659-678: isTruncatedUtf8Sequence can return true when start >=
bytes.length because bytes[start] is undefined; add an early guard in
isTruncatedUtf8Sequence to return false if start is out of range (start < 0 or
start >= bytes.length) so you never read bytes[start]; keep the rest of the
logic intact (using $.getUtf8SequenceLength and $.isValidUtf8ContinuationByte)
after this check to correctly detect only true truncated sequences.
- Around line 832-839: toCodepointArray currently calls maybeSetTextFromBytes
which can set bitstring.text to false on UTF-8 decode failure, and then
Array.from(false, ...) silently returns []; change to explicitly guard after
maybeSetTextFromBytes in static toCodepointArray(bitstring): if bitstring.text
=== false (or not a string) throw a descriptive TypeError (or return null per
API decision) indicating invalid UTF-8 bitstring so callers don't get an empty
array silently; reference the toCodepointArray method and maybeSetTextFromBytes
when adding the check and the error message.
---
Nitpick comments:
In `@assets/js/bitstring.mjs`:
- Around line 255-269: The function decodeUtf8CodePoint allocates the lookup
object firstByteMasks on every call (and similarly minValueForLength elsewhere),
so hoist these constant maps to module/class scope as static fields to avoid
repeated allocation; create constants (e.g. utf8FirstByteMasks and
utf8MinCodePointForLength) at the top of the module or as private static fields
on the BitString class and replace local usages of firstByteMasks and
minValueForLength with references to those static/module constants (e.g.,
BitString.#utf8FirstByteMasks[length] or the module-level
utf8FirstByteMasks[length]).
In `@assets/js/erlang/unicode.mjs`:
- Around line 271-354: The four functions characters_to_nfc_binary/1,
characters_to_nfd_binary/1, characters_to_nfkc_binary/1, and
characters_to_nfkd_binary/1 duplicate the same helpers (validateListRest,
handleConversionError, handleInvalidUtf8) and differ only by the normalization
form string; extract a factory like makeNormalizationBinaryFn(form) that returns
the function implementing the shared logic and uses the form when calling
String.prototype.normalize (i.e., replace hardcoded "NFC" with the parameter),
then replace each of the four exported entries with
makeNormalizationBinaryFn("NFC"), makeNormalizationBinaryFn("NFD"),
makeNormalizationBinaryFn("NFKC"), and makeNormalizationBinaryFn("NFKD")
respectively, keeping references to Erlang_Unicode["characters_to_binary/3"],
Bitstring.toText, Bitstring.getValidUtf8Length, Type.bitstring, and Type.tuple
unchanged inside the factory.
|
@coderabbitai do not review this code again while it is in draft. Also, please change the summary to include only changes from the latest commit in this PR. |
|
I'll regenerate the summary to include only the changes from the latest commit in this PR. Please regenerate the summary to include only the changes from the latest commit in PR ✏️ Learnings added
✅ Actions performedSummary regeneration triggered. |
Closes #720
Dependencies
Please note that this PR includes commits from the PR(s) it is dependent upon. Once the dependent PR(s) are merged to the dev branch, then this PR will be rebased and will then only contain its own commits. This PR will remain in draft until that point.
@coderabbitai Do not review this code while the PR is in draft.
Summary by CodeRabbit
Refactor
Tests